Using syntactic features to predict author personality from text

Authors

  • Kim Luyckx
  • Walter Daelemans
Abstract

The style in which a text is written reflects an array of meta-information concerning the text (e.g., topic, register, genre) and its author (e.g., gender, region, age, personality). The field of stylometry addresses these aspects of style. A successful methodology, borrowed from text categorisation research, takes a two-stage approach which (i) achieves automatic selection of features with high predictive value for the categories to be learned, and (ii) uses machine learning algorithms to learn to categorize new documents by using the selected features (Sebastiani, 2002). To allow the selection of linguistic features rather than (n-grams of) terms, robust and accurate text analysis tools are necessary. Recently, language technology has progressed to a state of the art in which the systematic study of the variation of these linguistic properties in texts by different authors, time periods, regiolects, genres, registers, or even genders has become feasible.
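
The two-stage approach described in the abstract can be illustrated with a minimal sketch. The abstract does not name a selection criterion or a learner, so everything here is an assumption chosen for illustration: chi-square scoring stands in for stage (i), a Bernoulli naive Bayes classifier stands in for stage (ii), and the part-of-speech bigram features, class labels, and documents are invented toy data, not taken from the paper or its corpus.

```python
import math
from collections import Counter

def chi_square(docs, labels, feature, target):
    """Chi-square statistic for presence of `feature` vs. membership in class `target`."""
    a = b = c = d = 0
    for feats, label in zip(docs, labels):
        present, positive = feature in feats, label == target
        if present and positive:
            a += 1
        elif present:
            b += 1
        elif positive:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom

def select_features(docs, labels, k):
    """Stage (i): keep the k features with the highest chi-square score over any class."""
    vocab = sorted({f for feats in docs for f in feats})
    classes = sorted(set(labels))
    scores = {f: max(chi_square(docs, labels, f, c) for c in classes) for f in vocab}
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(vocab, key=lambda f: (-scores[f], f))[:k]

def train_nb(docs, labels, features):
    """Stage (ii): Bernoulli naive Bayes over the selected features, add-one smoothing."""
    class_count = Counter(labels)
    present = {c: Counter() for c in class_count}
    for feats, label in zip(docs, labels):
        for f in features:
            if f in feats:
                present[label][f] += 1
    model = {}
    for c, n_c in class_count.items():
        prior = math.log(n_c / len(labels))
        probs = {f: (present[c][f] + 1) / (n_c + 2) for f in features}
        model[c] = (prior, probs)
    return model

def predict(model, feats):
    """Return the class with the highest log-posterior for a new document."""
    def log_score(c):
        prior, probs = model[c]
        return prior + sum(math.log(p if f in feats else 1.0 - p)
                           for f, p in probs.items())
    return max(sorted(model), key=log_score)

# Hypothetical toy data: each document is a set of part-of-speech bigrams.
train_docs = [
    {"PRON_VERB", "DET_NOUN", "ADV_ADJ"},
    {"PRON_VERB", "DET_NOUN"},
    {"PRON_VERB", "ADV_ADJ", "NOUN_NOUN"},
    {"NOUN_NOUN", "DET_NOUN", "ADJ_NOUN"},
    {"NOUN_NOUN", "ADJ_NOUN"},
    {"NOUN_NOUN", "DET_NOUN", "ADJ_NOUN", "ADV_ADJ"},
]
train_labels = ["extravert"] * 3 + ["introvert"] * 3  # invented labels

selected = select_features(train_docs, train_labels, k=2)
model = train_nb(train_docs, train_labels, selected)
print(selected)
print(predict(model, {"PRON_VERB", "DET_NOUN"}))
```

Separating selection from learning means the classifier only sees features that discriminate between the categories; in the toy data, DET_NOUN occurs equally often in both classes and is discarded before training.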

Similar resources

Author gender identification from text using Bayesian Random Forest

Nowadays high usage of users from virtual environments and their connection via social networks like Facebook, Instagram, and Twitter shows the necessity of finding out shared subjects in this environment more than before. There are several applications that benefit from reliable methods for inferring age and gender of users in social media. Such applications exist across a wide area of fields,...

Syntactic N-grams as Features for the Author Profiling Task: Notebook for PAN at CLEF 2015

This paper describes our approach to tackle the Author Profiling task at PAN 2015. Our method relies on syntactic features, such as syntactic based n-grams of various types in order to predict the age, gender and personality traits that has the author of a given text. In this paper, we describe the used features, the employed classification algorithm, and other general ideas concerning the expe...

The Aesthetics of the Creation Sermon in the Light of Formalist Criticism

The present study is an attempt to investigate the first sermon of NahjulBalāgheh in the light of the formalistic criticism. It deals with the issue that the religious meaning and content of the creation sermon should be considered as a foundation, and how much the form and integration have been regarded, and to what amount the literary text has been observed by the author. Ignoring the non-tex...

Personae: a Corpus for Author and Personality Prediction from Text

We present a new corpus for computational stylometry, more specifically authorship attribution and the prediction of author personality from text. Because of the large number of authors (145), the corpus will allow previously impossible studies of variation in features considered predictive for writing style. The innovative meta-information (personality profiles of the authors) associated with ...

A Novel Approach to Conditional Random Field-based Named Entity Recognition using Persian Specific Features

Named Entity Recognition is an information extraction technique that identifies name entities in a text. Three popular methods have been conventionally used namely: rule-based, machine-learning-based and hybrid of them to extract named entities from a text. Machine-learning-based methods have good performance in the Persian language if they are trained with good features. To get good performanc...

Publication date: 2008